Rajakumari, Dr. D.
- Effective Feature Selection Method for Cervical Cancer Dataset Using Data Mining Classification Analytical Model
Authors
1 Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, IN
Source
Digital Image Processing, Vol 11, No 12 (2019), Pagination: 208-214
Abstract
Data mining is a set of techniques for deriving hidden patterns from data. Its purpose is to find information that is not directly visible or retrievable by reading the data or running simple queries against it. A key use of data mining is predicting the future from past and present data. Such predictions are widely needed to improve future outcomes, and an accurate and timely prediction can help avoid future problems. Healthcare is a field in which critical diseases such as cancers must be diagnosed before they become life-threatening. This paper explains how data mining techniques can be used in healthcare, specifically to predict the possibility that a patient suffers from cervical cancer. The main goal is to design a database that can be used for future data mining. The paper implements feature model construction and a comparative analysis for improving the prediction accuracy for cervical cancer patients in four phases. In the first phase, the min-max normalization algorithm is applied to the original cervical cancer patient datasets collected from the UCI repository. In the second phase, feature selection is used to obtain a subset of the normalized dataset that comprises only the significant attributes. In the third phase, classification algorithms are applied to the dataset. In the fourth phase, accuracy is calculated using the root mean square value and the root mean error value. KNN and SVM are found to be the better-performing algorithms after feature selection is applied. Finally, the evaluation is done based on accuracy values.
The outputs of the proposed GA-based feature extraction with classification model implementations indicate that, with the help of feature selection, the KNN and SVM algorithms outperform all other classification algorithms, with an accuracy of 97.60%.
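The first and fourth phases described in the abstract (min-max normalization and an error measure based on the root mean square) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `min_max_normalize` and `root_mean_square_error` and the sample values are assumptions made for illustration, and the handling of a constant column is one possible convention.

```python
import math


def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; mapping it to 0.0 is an assumed convention.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def root_mean_square_error(actual, predicted):
    """Square root of the mean squared difference between two equal-length lists."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical attribute values and labels, not taken from the UCI dataset.
ages = [18, 25, 32, 46]
normalized_ages = min_max_normalize(ages)
error = root_mean_square_error([1, 0, 1, 1], [1, 1, 1, 0])
```

Normalizing each attribute to a common range matters for distance-based classifiers such as KNN, where an attribute with a large numeric range would otherwise dominate the distance.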
Keywords
Cervical Cancer dataset, Data Mining Algorithm, KNN, SVM
- Pearson Correlation Coefficient k-Nearest Neighbor Outlier Classification on Real-Time Data Set
Authors
1 Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, IN
Source
Programmable Device Circuits and Systems, Vol 12, No 1 (2020), Pagination: 1-7
Abstract
Detection and classification of data that do not meet the expected behavior (outliers) play a major role in a wide variety of applications, such as military surveillance, intrusion detection in cyber security, and fraud detection in online transactions. Accurate detection of outliers in high-dimensional data is currently a major issue. A trade-off between high accuracy and low computational time is the major requirement in outlier prediction and classification. The presence of large, diverse feature sets calls for a reduction mechanism prior to classification. To achieve this, Distance-based Outlier Classification (DOC) is proposed in this paper. The proposed work uses the Pearson Correlation Coefficient (PCC) to measure the correlation between data instances. Minimum instance learning through PCC estimation reduces the dimensionality. The proposed work is split into two phases, training and testing. During training, labeling the most frequent samples isolates them from the infrequent ones and reduces the data size effectively. The testing phase employs the k-Nearest Neighbor (k-NN) scheme to classify the frequent samples effectively. The dimensionality and the k-value are inversely proportional to each other. In the proposed work, selecting a large value of k offers a significant reduction in dimensionality. The combination of PCC-based instance learning and a high value of k reduces the dimensionality and noise, respectively. A comparative analysis of the proposed PCC-k-NN against conventional algorithms such as Decision Tree, Naïve Bayes, Instance-Based K-means (IBK), and Triangular Boundary-based Classification (TBC) in terms of sensitivity, specificity, accuracy, precision, and recall demonstrates its effectiveness in outlier classification (OC). In addition, experimental validation of the proposed PCC-k-NN against state-of-the-art methods in terms of execution time confirms the trade-off between low time consumption and high accuracy.
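The two building blocks named in the abstract, the Pearson Correlation Coefficient and k-NN voting, can be sketched as follows. The abstract does not specify how the two are combined, so using PCC directly as the similarity measure for the neighbor search is an assumption of this sketch, as are the function names `pearson` and `knn_predict`, the zero-variance convention, and the sample data.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; 0.0 is an assumed fallback.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def knn_predict(train, labels, query, k):
    """Majority vote among the k training instances most correlated with the query."""
    ranked = sorted(
        zip(train, labels),
        key=lambda pair: pearson(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical instances and labels, used only to exercise the functions.
train = [[1, 2, 3], [2, 4, 6.5], [3, 2, 1], [6, 4, 1]]
labels = ["normal", "normal", "outlier", "outlier"]
prediction = knn_predict(train, labels, [1, 3, 5], k=3)
```

A larger k averages the vote over more neighbors, which is consistent with the abstract's remark that a high value of k reduces the effect of noise.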